10 research outputs found

    Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

    Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
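    The abstract above mentions combining evidence from nested matching speech regions into a single document ranking. A minimal sketch of one plausible combination scheme (a length-weighted sum of per-region acoustic match scores) is shown below; the function name, the input structure, and the weighting itself are illustrative assumptions, not the paper's actual method.

```python
def rank_documents(region_matches):
    # region_matches: {doc_id: [(match_score, region_length), ...]}
    # Hypothetical evidence combination: score each document by a
    # length-weighted sum of its acoustic match scores, so that long,
    # confident matching regions dominate the ranking.
    scores = {}
    for doc_id, regions in region_matches.items():
        scores[doc_id] = sum(score * length for score, length in regions)
    # Return doc_ids sorted by combined score, best first.
    return sorted(scores, key=scores.get, reverse=True)
```

    Any monotone aggregation (maximum region score, a soft-max, a discounted sum) could be substituted here; the point is only that per-region evidence is reduced to one comparable score per document.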

    Neural Language Model Based Attentive Term Dependence Model for Verbose Query (Student Abstract)

    No full text
    Query-document term matching plays an important role in information retrieval. However, retrieval performance degrades when documents match extraneous query terms, a problem that arises frequently in verbose queries. To address this problem, we generate dense vectors for the entire query and for individual query terms using the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, and subsequently analyze their relation to focus on the contextually central terms. We then propose a context-aware attentive extension of the unsupervised Markov Random Field-based sequential term dependence model that explicitly pays more attention to those central terms. The proposed model utilizes the strengths of the pre-trained large language model to estimate the attention weight of terms and ranks documents in a single pass without any supervision.
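    The core step described above — weighting each query term by how close its embedding is to the whole-query embedding — can be sketched as a softmax over cosine similarities. The vectors below stand in for BERT outputs; the function names and the softmax choice are assumptions for illustration, not the authors' exact formulation.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def term_attention(query_vec, term_vecs):
    # Attention weight per term: softmax over each term embedding's
    # cosine similarity to the whole-query embedding, so terms whose
    # meaning aligns with the full query receive more weight.
    sims = [cosine(query_vec, t) for t in term_vecs]
    m = max(sims)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]
```

    In the full model these weights would then scale per-term (and term-dependence) match scores inside the Markov Random Field ranking function.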

    Software-performance evaluation

    No full text
    In some jurisdictions, parties to a lawsuit can request documents from each other, but documents subject to a claim of privilege may be withheld. The TREC 2010 Legal Track developed what is presently the only public test collection for evaluating privilege classification. This paper examines the reliability and reusability of that collection. For reliability, the key question is the extent to which privilege judgments correctly reflect the opinion of the senior litigator whose judgment is authoritative. For reusability, the key question is the degree to which systems whose results contributed to creation of the test collection can be fairly compared with other systems that use those privilege judgments in the future. These correspond to measurement error and sampling error, respectively. The results indicate that measurement error is the larger problem.

    Overview of the FIRE 2011 RISOT Task

    No full text
    RISOT was a pilot task in FIRE 2011 that focused on the retrieval of automatically recognized text from machine-printed sources. The collection used for search was a subset of the FIRE 2008 and 2010 Bengali test collections that contained 92 topics and 62,825 documents. Two teams participated, submitting a total of 11 monolingual runs.

    GRAS

    No full text

    The FIRE 2013 question answering for the spoken web task. Forum for Information Retrieval Evaluation

    No full text
    The FIRE 2013 Question Answering for the Spoken Web (QASW) task was an information retrieval evaluation in which the goal was to match spoken Gujarati questions to spoken Gujarati answers. This paper describes the design of the task, the development of the test collection, the runs that were submitted, and the corresponding results.

    Incremental blind feedback

    No full text

    A Fast Corpus-Based Stemmer

    No full text

    Effective and Robust Query-Based Stemming

    No full text